ESQL: Skip nullifying aliases for Aggregate groups by bpintea · Pull Request #141340 · elastic/elasticsearch

bpintea · 2026-01-27T10:49:31Z

This re-introduces skipping the UnresolvedAttributes that are
collected from the Aggregate#aggregates and are Aliases in the
#groupings. In this case the aliases, not their children, result in an
UnresolvedAttribute. These must not be nullified, since they'll be
subsequently resolved as part of the Alias's child resolution.

The side effect of doing otherwise is that they can shadow attributes
produced by the source or Eval'd.

Relatedly, Aggregate resolution now skips resolving the aggregates against
those input attributes that share a name with the not-yet-resolved
UnresolvedAttributes. Resolving against them was incorrect, but the problem
didn't occur before the unmapped-fields feature, since the resolution of the
Aggregate all happened in one Analyzer cycle (unlike with unmapped-fields).

Another fix concerns skipping the right-hand side of Joins when
introducing null-aliases.
Also, null-aliases are no longer introduced behind Aggregates
if these shadow an existing source attribute. (This is a moot change, since in
this case the plan verification would fail anyway. But it is the correct
way.)

This fixes the generation of name IDs for the attributes that correspond to the unmapped fields and are pushed to different branches in UnionAll. So far, one set of IDs was generated and reused for all subplans. Now each subplan gets its own set. A minor collateral proposed change: the CSV spec-based tests skipped due to missing capabilities are now logged.


- LookupJoin no longer has Evals inserted on the right-hand side
- nullifying now only iterates the plan once
- make use of the new transformDownSkipBranch
- avoid shadowing source attributes behind Aggregates
- check resulting plan on statement analysis

elasticsearchmachine · 2026-01-30T06:10:17Z

Hi @bpintea, I've created a changelog YAML for you.

elasticsearchmachine · 2026-01-30T06:10:45Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

astefan

Left a few comments. I still need to look over the tests in detail.

astefan · 2026-02-03T15:29:22Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

+     * @return the names of the aliases used in the grouping expressions of any Aggregate found in the plan.
+     */
+    private static Set<String> aliasNamesInAggregateGroupings(LogicalPlan plan) {
+        Set<String> aliasNames = new LinkedHashSet<>();


I don't think you need a LinkedHashSet here, unless I'm missing something. A simple HashSet should be enough.

astefan · 2026-02-03T15:32:14Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

     */
    private static List<UnresolvedAttribute> collectUnresolved(LogicalPlan plan) {
+        var aliasedGroupings = aliasNamesInAggregateGroupings(plan);
        List<UnresolvedAttribute> unresolved = new ArrayList<>();


Here you could also build a LinkedHashSet directly and not call unresolvedLinkedSet (which is used only once).

astefan · 2026-02-03T15:53:07Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

    }

+    private static List<Alias> removeShadowing(List<Alias> aliases, List<Attribute> exclude) {
+        Set<String> excludeNames = new HashSet<>(Expressions.names(exclude));


I am wondering, a wild thought, if the exclude list should skip synthetic attributes...

astefan · 2026-02-03T16:07:35Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

        var transformed = load ? load(plan, unresolvedLinkedSet) : nullify(plan, unresolvedLinkedSet);

-        return transformed.equals(plan) ? plan : refreshPlan(transformed, unresolved);
+        return transformed == plan ? plan : refreshPlan(transformed, unresolved);


Unrelated to this change, I found the refreshUnresolved(LogicalPlan plan, List<UnresolvedAttribute> unresolved) method to be unnecessary, imho. I guess it boils down to one's style/preference; for me the code is too fragmented in a few places, with methods that are called only once. refreshPlan would imho describe the logic better if all the code were in that method.

astefan

Tests do also LGTM

astefan · 2026-02-04T10:34:06Z

...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java

+     *                  max(@timestamp){r}#90, language_name{f}#57, does_not_exist1{r}#109, $$does_not_exist1$converted_to$long{r$}#149,
+     *                  does_not_exist2{r}#154]]
+     *             \_Eval[[TOLONG(does_not_exist1{r}#109) AS $$does_not_exist1$converted_to$long#149]]
     *               \_Eval[[null[KEYWORD] AS languageName#89, null[DATETIME] AS max(@timestamp)#90]]


Unrelated to this PR.
We create null columns for those fields that are not common to the subqueries, but we seem to pick one common data type for those fields that are common to them but have different data types.

For example (from our own test data):

FROM (FROM sample_data metadata _index | STATS cnt = count(*) by _index, client_ip ) , (FROM sample_data_str metadata _index | STATS cnt = count(*) by _index, client_ip ) metadata _index

has results with

{ "name": "client_ip", "type": "keyword" }

But, if I use FROM sample_data, sample_data_str, I get

"name": "client_ip", "type": "unsupported", "original_types": [ "ip", "keyword" ], "suggested_cast": "keyword"

I don't remember the sub-queries functionality well enough, but it's a bit surprising (to me at least). Given that unmapped fields will treat a nonexistent field as either NULL type or KEYWORD type, I am wondering if users wouldn't expect some kind of SET to act on the scenario above similarly to the unmapped-fields behavior.

astefan · 2026-02-04T14:00:38Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

+                && (nAry instanceof Join == false || child == ((Join) nAry).left())) {
                assertSourceType(source);
+                var nullAliases = removeShadowing(nullAliases(unresolved), source.output());
                child = new Eval(source.source(), source, nullAliases);


While trying to break the logic here (there doesn't seem to be a way to protect the Eval constructor in case nullAliases is empty, since removeShadowing could change its content), I encountered an issue with the following query:

from employees | eval does_not_exist = does_not_exist2 | mv_expand does_not_exist | keep does_not_exist*

which results in

"type": "illegal_state_exception", "reason": "Found 1 problem\nline 4:85: Plan [ProjectExec[[<no-fields>{r$}#81]]] optimized incorrectly due to missing references [<no-fields>{r$}#81]",

My initial attempt was with

from employees | eval does_not_exist = does_not_exist2 | mv_expand does_not_exist | keep does_not_exist* | stats count(*)

which returned 0 which is not correct.
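To make the concern about an empty nullAliases list concrete, here is a minimal sketch of name-based shadow removal followed by a guard that skips the Eval when nothing is left. It is only an illustration: RemoveShadowingDemo, planAbove and the string-based "plan" are hypothetical stand-ins rather than the ES|QL classes, and this is not necessarily the fix adopted for the queries above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model: aliases and attributes are represented by their names only.
final class RemoveShadowingDemo {
    // Keeps only the alias names that do not collide with a name the source already produces.
    static List<String> removeShadowing(List<String> aliasNames, List<String> sourceOutputNames) {
        Set<String> exclude = new HashSet<>(sourceOutputNames);
        return aliasNames.stream().filter(n -> exclude.contains(n) == false).collect(Collectors.toList());
    }

    // Describes the plan fragment that would be built atop the source.
    // Assumption: when no alias survives the filtering, no Eval is added at all.
    static String planAbove(String source, List<String> aliasNames, List<String> sourceOutputNames) {
        List<String> nullAliases = removeShadowing(aliasNames, sourceOutputNames);
        return nullAliases.isEmpty() ? source : "Eval" + nullAliases + " <- " + source;
    }
}
```

With aliases does_not_exist2 and emp_no over a source that already outputs emp_no, only does_not_exist2 would be evaluated to null; if every alias is shadowed, the source is returned unchanged.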

alex-spies

First pass, looking pretty good so far. Need another pass to see the tests.

@bpintea , I understand that you'll not be iterating on this, so please don't consider my comments as calls to action. I plan to address the comments myself, most likely, once I'm done reviewing.

alex-spies · 2026-02-12T10:53:22Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/unmapped-nullify.csv-spec

+FROM languages
+| STATS c = COUNT(*) BY language_code = does_not_exist


Let's add a version of this with just BY does_not_exist.

That's kinda technically already covered above, but mostly in queries that use ROW.

alex-spies · 2026-02-12T11:15:21Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java

                Holder<Boolean> changed = new Holder<>(false);
-                List<Attribute> resolvedList = NamedExpressions.mergeOutputAttributes(resolvedGroupings, childrenOutput);
+                var inputAttributes = new ArrayList<>(childrenOutput);
+                // remove input attributes with the same name as unresolved groupings: could be shadowed by not yet resolved renamed groups


Let's add an example to the comment, and mention that this only applies to the case of nullify/load.

I think an example is

ROW x = 1, language_code = 2 | STATS c = max(language_code) BY language_code = does_not_exist

Once resolved, the language_code grouping will shadow the original language_code. When resolving max(language_code), we would accidentally have it use the language_code from the ROW because ResolveUnmapped hasn't yet resolved BY language_code = does_not_exist, which is of course wrong.
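The filtering described in this thread could be sketched roughly as follows. This is an illustration under simplifying assumptions: attributes are reduced to their names, and GroupingShadowDemo and usableInputs are hypothetical names, not part of Analyzer.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model: input attributes are represented by their names only.
final class GroupingShadowDemo {
    // Drops the input attributes whose name matches a grouping alias that is not resolved yet,
    // so that aggregates cannot be resolved against an attribute that is about to be shadowed.
    static List<String> usableInputs(List<String> childOutputNames, Set<String> unresolvedGroupingNames) {
        List<String> inputs = new ArrayList<>(childOutputNames);
        inputs.removeIf(unresolvedGroupingNames::contains);
        return inputs;
    }
}
```

For the ROW example above, the child output is x and language_code, and the unresolved grouping is named language_code, so only x stays available until the grouping itself is resolved.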

alex-spies · 2026-02-12T11:43:34Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

        var transformed = load ? load(plan, unresolvedLinkedSet) : nullify(plan, unresolvedLinkedSet);

-        return transformed.equals(plan) ? plan : refreshPlan(transformed, unresolved);
+        return transformed == plan ? plan : refreshPlan(transformed, unresolved);


Reverting this doesn't fail any AnalyzerUnmappedTests or AnalyzerTests or EsqlSpecITs.

@bpintea do we just try to avoid computing the expensive equality, or is there genuinely a case where equals doesn't notice a legitimate change in transformed?

> do we just try to avoid computing the expensive equality

Yes. If a node in the tree is updated such that its new instance isn't .equals() to the old one, then all the parent nodes will be instantiated anew. And either nullifying or loading should produce such a non-equal child node.
So an instance equality check should suffice.
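The argument can be illustrated with a small immutable tree whose transform only allocates new parents when a child actually changed. This is a sketch with a hypothetical Node class, not the ES|QL LogicalPlan API; it assumes, as stated above, that an untouched subtree is returned as the very same instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-in for an immutable plan node.
final class Node {
    final String name;
    final List<Node> children;

    Node(String name, List<Node> children) {
        this.name = name;
        this.children = List.copyOf(children);
    }

    // Applies the rule top-down. A parent is re-created only if one of its children changed.
    Node transformDown(UnaryOperator<Node> rule) {
        Node self = rule.apply(this);
        boolean childChanged = false;
        List<Node> newChildren = new ArrayList<>(self.children.size());
        for (Node child : self.children) {
            Node transformed = child.transformDown(rule);
            childChanged |= transformed != child;
            newChildren.add(transformed);
        }
        return childChanged ? new Node(self.name, newChildren) : self;
    }
}

final class IdentityCheckDemo {
    // A rule that rewrites nothing: the root comes back as the same instance.
    static Node noop(Node plan) {
        return plan.transformDown(n -> n);
    }

    // A rule that replaces every "Unresolved" node with a fresh "NullLiteral" leaf.
    static Node nullify(Node plan) {
        return plan.transformDown(n -> n.name.equals("Unresolved") ? new Node("NullLiteral", List.of()) : n);
    }
}
```

Under these assumptions, `transformed == plan` holds exactly when no rule fired anywhere in the tree, so the reference comparison avoids a deep equals over the whole plan.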

alex-spies · 2026-02-12T12:02:38Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

-                n -> {
-                    patched.set(true);
-                    return patchForkProject((Project) n);
+            var transformed = child.transformDownSkipBranch((n, skip) -> {


It looks like we should call fork.transformDownSkipBranch, not child.transformDownSkipBranch. The method transformDownSkipBranch already implements "for each branch, do this until we skip, then move to the next branch". Calling it on each child seems redundant.

> Calling it on each child seems redundant.

~~But we want to skip only per branch, right, not per the entire Fork.~~
Edit: yes :)
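For readers unfamiliar with the idea, the per-branch skipping discussed here could look roughly like the walk below. This is only a sketch of the general technique: SkipBranchDemo, Tree, Skip and walkDownSkipBranch are hypothetical, and the actual signature and semantics of transformDownSkipBranch may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

final class SkipBranchDemo {
    // Hypothetical, simplified tree node; not the ES|QL plan classes.
    static final class Tree {
        final String name;
        final List<Tree> children;

        Tree(String name, Tree... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    // Flag the visitor can raise to stop descending below the current node.
    static final class Skip {
        private boolean requested;

        void skipBranch() {
            requested = true;
        }
    }

    // Top-down walk; each node gets a fresh flag, so a skip never affects sibling branches.
    static void walkDownSkipBranch(Tree node, BiConsumer<Tree, Skip> visitor) {
        Skip skip = new Skip();
        visitor.accept(node, skip);
        if (skip.requested) {
            return;
        }
        for (Tree child : node.children) {
            walkDownSkipBranch(child, visitor);
        }
    }

    // Returns the visit order when the walk stops below every node named stopAt.
    static List<String> visited(Tree root, String stopAt) {
        List<String> names = new ArrayList<>();
        walkDownSkipBranch(root, (n, skip) -> {
            names.add(n.name);
            if (n.name.equals(stopAt)) {
                skip.skipBranch();
            }
        });
        return names;
    }
}
```

Started at a Fork-like root with two branches, skipping at the first branch's Project still visits the second branch in full, which is why a single call on the root would be enough in this model.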

alex-spies · 2026-02-12T13:12:51Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

@@ -167,12 +168,14 @@ private static Fork patchFork(Fork fork) {
     * by the evalUnresolvedAtopXXX methods and need to be "let through" the Project.
     */
    private static Project patchForkProject(Project project) {


thought: This whole method looks like we're treating the projects added by FORK like ResolvingProjects corresponding to KEEP field1, ..., fieldN, *. Maybe we could simplify things greatly if we actually went and turned FORK's Projects into such ResolvingProjects.

That might work, though we need to make sure the outputs remain aligned (i.e. same names & types at same positions).

alex-spies · 2026-02-12T16:28:42Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

+                // skip right-sides of the Joins
+                && (nAry instanceof Join == false || child == ((Join) nAry).left())) {


This should fix some of the bugs with joins that we saw. Before merging, let's see if we can add spec tests with reproducers.

alex-spies · 2026-02-12T16:30:44Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/rules/ResolveUnmapped.java

+                && (nAry instanceof Join == false || child == ((Join) nAry).left())) {
                assertSourceType(source);
+                var nullAliases = removeShadowing(nullAliases(unresolved), source.output());
                child = new Eval(source.source(), source, nullAliases);


Nice find. I think we can fix this before merging, and add a test for this case.

bpintea · 2026-02-12T19:50:13Z

> you'll not be iterating on this

Not too soon, unfortunately.

> I plan to address the comments myself, most likely, once I'm done reviewing.

🙏

alex-spies

Nice, the change looks A-OK to me. Thanks @bpintea .

The LOOKUP JOIN related changes aren't tested, though. I'll go and add at least a spec test before merging so we have some coverage.

I'll address the review remarks before merging, too.

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTestUtils.java

alex-spies · 2026-02-13T16:33:50Z

...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java

+     *                  max(@timestamp){r}#90, language_name{f}#57, does_not_exist1{r}#109, $$does_not_exist1$converted_to$long{r$}#149,
+     *                  does_not_exist2{r}#154]]
+     *             \_Eval[[TOLONG(does_not_exist1{r}#109) AS $$does_not_exist1$converted_to$long#149]]
     *               \_Eval[[null[KEYWORD] AS languageName#89, null[DATETIME] AS max(@timestamp)#90]]


This is a nice catch.

@fang-xing-esql , @craigtaverner - we seem to be auto-casting here, but I recall that we wanted to not auto-cast in subqueries/views for now because that'd increase the scope?

I can reproduce Andrei's observation with the simpler

FROM (FROM sample_data), (FROM sample_data_str) | KEEP client_ip

However! The values for client_ip are still all null.

alex-spies · 2026-02-13T16:36:53Z

...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java

     *                     | \_EsRelation[test][_meta_field{f}#51, emp_no{f}#45, first_name{f}#46, ..]
-     *                     \_Eval[[null[NULL] AS does_not_exist1#110, null[NULL] AS does_not_exist2#155]]
-     *                       \_EsRelation[languages_lookup][LOOKUP][language_code{f}#56, language_name{f}#57]
+     *                     \_EsRelation[languages_lookup][LOOKUP][language_code{f}#56, language_name{f}#57]


This looks correct, but it's not actually asserted in this test. (Nothing is, we need golden tests here as the test is too unwieldy otherwise.)

bpintea and others added 7 commits January 26, 2026 12:11

Update docs/changelog/141262.yaml

fd346a1

checkstyle

6453bfc

Merge remote-tracking branch 'upstream/main' into fix/unmapped_unique_id_on_branches

c213602

review comments

0aae586

Merge remote-tracking branch 'upstream/main' into fix/unmapped_unique_id_on_branches

6b19262

bpintea added the >bug, WIP, :Analytics/ES|QL, v9.3.1, and v9.4.0 labels Jan 27, 2026

bpintea added 5 commits January 27, 2026 12:23

guard new tests with new cap

1288039

guard new tests with new cap

6794694

Merge remote-tracking branch 'upstream/main' into fix/unmapped_no_grouping_aliases

a720868

Fixes

179b807

Merge remote-tracking branch 'upstream/main' into fix/unmapped_no_grouping_aliases

1028271

bpintea added the auto-backport label and removed the WIP label Jan 30, 2026

bpintea requested review from GalLalouche, alex-spies and astefan and removed request for GalLalouche January 30, 2026 06:09

Update docs/changelog/141340.yaml

f0d20c8

bpintea marked this pull request as ready for review January 30, 2026 06:10

elasticsearchmachine added the Team:Analytics label Jan 30, 2026

bpintea mentioned this pull request Jan 30, 2026

ESQL: support for mapping-unavailable fields meta-issue #138888

Open

63 tasks

This was referenced Jan 30, 2026

ESQL: introduce support for mapping-unavailable fields (Fork from #139417) #140463

Merged

ESQL: Fix injected attributes's IDs in UnionAll branches #141262

Merged

bpintea and others added 3 commits January 30, 2026 13:06

Use identity check instead of object's

feee104

Merge remote-tracking branch 'upstream/main' into fix/unmapped_no_grouping_aliases

65198a2

Small change to see if I can push

53fe729

astefan reviewed Feb 3, 2026

astefan approved these changes Feb 4, 2026

astefan reviewed Feb 4, 2026

alex-spies mentioned this pull request Feb 4, 2026

ESQL: bugs in unmapped_fields="nullify" from spec tests #141870

Open

alex-spies reviewed Feb 12, 2026

alex-spies approved these changes Feb 13, 2026

Merge remote-tracking branch 'upstream/main' into fix/unmapped_no_grouping_aliases

fae074c

alex-spies mentioned this pull request Feb 13, 2026

ESQL: inconsistent type resolution of mapping conflicts from subqueries #142499

Open
